Conversation
❌ 2 Tests Failed:
View the top 3 failed test(s) by shortest run time
To view more test analytics, go to the Test Analytics Dashboard |
🔄 Flaky Test DetectedAnalysis: Both failures are caused by ✅ Automatically retrying the workflow |
🔄 Flaky Test DetectedAnalysis: Both failures stem from PostgreSQL connection termination (SQLSTATE 57P01 — "terminating connection due to administrator command") and an active replication slot blocking teardown (SQLSTATE 55006), which are transient infrastructure/resource-contention issues in the concurrent e2e test environment, not code logic bugs. ✅ Automatically retrying the workflow |
🔄 Flaky Test DetectedAnalysis: Both failures are transient: one is a snapshot status timeout (async timing issue) and the other is a PostgreSQL connection terminated by admin shutdown (SQLSTATE 57P01), neither indicating a code regression. ✅ Automatically retrying the workflow |
|
|
||
| // conditionallyRetryableExceptions are error codes that are only retryable | ||
| // when the error message contains one of the specified substrings | ||
| var conditionallyRetryableExceptions = map[chproto.Error][]string{ |
There was a problem hiding this comment.
Just an idea, how about retryableExceptionSubstrings? Wouldn't need an explanation this way
🔄 Flaky Test DetectedAnalysis: The test failed due to a transient PostgreSQL connection drop (SQLSTATE 57P01 — admin_shutdown), where the catalog DB connection was terminated by an administrator command mid-test, indicating a CI infrastructure issue rather than a code bug. ✅ Automatically retrying the workflow |
🔄 Flaky Test DetectedAnalysis: TestGenericBQ/Test_Simple_Flow timed out waiting for STATUS_SNAPSHOT to complete — a classic transient timeout in an e2e test that depends on BigQuery and external services, with only 2 failures out of 2348 tests and the codebase itself flagging BigQuery tests as flaky under high concurrency. ✅ Automatically retrying the workflow |
This PR fix two things:
code: 1001, message: std::__1::ios_base::failure: ios_base::clear: unspecified iostream_category error. This is a transient networking error that can happen in s3 during read and usually recovers on single retry. We want to also categorize it as retryable to avoid starting over the snapshot after a single error.